Efficient Fine-Grain Synchronization on a Multi-Core Chip Architecture: A Fresh Look
Authors
Abstract
Multi-core chip architectures are becoming mainstream, permitting increasing on-chip parallelism through hardware support for multithreading. Fine-grain synchronization is essential to effectively utilizing the capacity of future high-performance multi-core architectures. However, realizing such fine-grain synchronization poses new challenges in large-scale multi-core chip architectures such as the IBM Cyclops-64 chip, which contains more than 100 processing cores and employs a memory organization with explicitly addressable memory segments instead of a data cache. This paper takes a fresh look at these challenges and proposes a scalable solution for fine-grain synchronization that efficiently enforces mutual exclusion and read-after-write data dependencies between concurrent threads. Using the Cyclops-64 chip architecture as a case study, we show how a small Synchronization State Buffer (SSB) associated with each memory bank can accelerate fine-grain synchronization by recording and managing the states of frequently synchronized data units, with only modest hardware extensions. We demonstrate the effectiveness and efficiency of the proposed solution:

• For mutual exclusion: using distributed fine-grain locking at each of the memory units, we avoid unnecessary serialization of operations on different elements of the same concurrent data structure, and we do so efficiently.
• For read-after-write data-dependency synchronization: our method encourages the exploitation of DOACROSS-style loop-level parallelism, where loop-carried data dependencies can often be implemented directly with fine-grain synchronization operations, allowing unnecessary barriers to be removed.

The experimental results demonstrate significant performance gains from using these fine-grain synchronization solutions.
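To make the SSB idea more concrete, the following Go sketch models one memory bank's buffer as a small, fixed-capacity table that holds lock state only for addresses currently being synchronized; an address with no entry is implicitly unlocked. Everything here is hypothetical: the names `SSB`, `ReadLock`, `WriteLock`, `Unlock`, the `Overflow` fallback result, and the reader/writer state encoding are assumptions for illustration, not the Cyclops-64 instruction set or the paper's actual entry format. The model is sequential because a memory bank is assumed to serve one request at a time.

```go
package main

import "fmt"

// lockState is a hypothetical per-address synchronization state.
type lockState struct {
	readers int  // number of shared (read) lock holders
	writer  bool // true while an exclusive (write) lock is held
}

// SSB models a small, fixed-capacity Synchronization State Buffer attached
// to a single memory bank (illustrative assumption, not the real hardware).
type SSB struct {
	capacity int
	entries  map[uint64]*lockState
}

func NewSSB(capacity int) *SSB {
	return &SSB{capacity: capacity, entries: make(map[uint64]*lockState)}
}

// Result is the outcome of a lock request.
type Result int

const (
	Granted  Result = iota // lock acquired
	Busy                   // conflicting holder on the same address; retry later
	Overflow               // no free entry; caller falls back to a software lock (assumed policy)
)

func (r Result) String() string {
	return [...]string{"Granted", "Busy", "Overflow"}[r]
}

// lookupOrAlloc returns the entry for addr, allocating one if space remains.
func (b *SSB) lookupOrAlloc(addr uint64) (*lockState, bool) {
	if e, ok := b.entries[addr]; ok {
		return e, true
	}
	if len(b.entries) >= b.capacity {
		return nil, false
	}
	e := &lockState{}
	b.entries[addr] = e
	return e, true
}

// ReadLock requests a shared lock; it conflicts only with a writer.
func (b *SSB) ReadLock(addr uint64) Result {
	e, ok := b.lookupOrAlloc(addr)
	if !ok {
		return Overflow
	}
	if e.writer {
		return Busy
	}
	e.readers++
	return Granted
}

// WriteLock requests an exclusive lock; it conflicts with any holder.
func (b *SSB) WriteLock(addr uint64) Result {
	e, ok := b.lookupOrAlloc(addr)
	if !ok {
		return Overflow
	}
	if e.writer || e.readers > 0 {
		return Busy
	}
	e.writer = true
	return Granted
}

// Unlock releases one hold on addr (caller is assumed to hold it) and frees
// the entry once no holders remain, so the slot can serve another address.
func (b *SSB) Unlock(addr uint64) {
	e, ok := b.entries[addr]
	if !ok {
		return
	}
	if e.writer {
		e.writer = false
	} else if e.readers > 0 {
		e.readers--
	}
	if !e.writer && e.readers == 0 {
		delete(b.entries, addr)
	}
}

// Occupied reports how many buffer entries are in use.
func (b *SSB) Occupied() int { return len(b.entries) }

func main() {
	ssb := NewSSB(2) // a deliberately tiny buffer with two entries

	fmt.Println(ssb.WriteLock(0x100)) // Granted: element 0x100 locked exclusively
	fmt.Println(ssb.WriteLock(0x108)) // Granted: a different element proceeds independently
	fmt.Println(ssb.ReadLock(0x100))  // Busy: conflicts with the writer on 0x100
	fmt.Println(ssb.WriteLock(0x110)) // Overflow: no free entry left
	ssb.Unlock(0x100)                 // frees the entry for 0x100
	fmt.Println(ssb.ReadLock(0x110))  // Granted: the freed entry is reused
	fmt.Println(ssb.Occupied())       // 2
}
```

In this sketch, locks on different elements (0x100 and 0x108) never contend, which is the unnecessary serialization that per-element locking is meant to avoid; only a genuine conflict on the same address returns `Busy`. Freeing entries as soon as the last holder releases is one plausible way a small buffer could cover only the frequently synchronized data units at any given moment.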
Related articles
Efficient Synchronization for a Large-scale Multi-core Chip Architecture
Multi-core architectures are becoming mainstream, permitting increasing on-chip parallelism through hardware support for multithreading. Synchronization, especially fine-grain synchronization, is essential to the effective utilization of the computational power of high-performance large-scale multi-core architectures. However, designing and implementing fine-grain synchronization in such architect...
Near fine grain parallel processing using a multiprocessor with MAPLE
The multi-grain parallelizing scheme is one of the effective parallelizing schemes; it exploits various levels of parallelism in a sequential program: coarse-grain (macro-dataflow), medium-grain (loop-level parallelizing) and near-fine-grain (statement-level parallelizing). A multiprocessor, ASCA, is designed for efficient execution of multi-grain parallelized programs. A processing element called MAPLE is mai...
An Efficient Synchronisation Mechanism for Multi-Core Systems
The use of efficient synchronization mechanisms is crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Cons...
A Study of Parallel Betweenness Centrality Algorithm on a Manycore Architecture
Large-scale graph analysis algorithms, such as those in the SSCA2 benchmarks studied in this paper, play an increasingly important role in high-performance computing applications. Unlike most traditional scientific computing applications, graph algorithms often show dynamic and irregular computing behavior. It is difficult to attain good performance on large scale conventional parallel arc...
The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures
A new synchronization mechanism, called I-Structure, was introduced under the dataflow model of computation during the late 1970s. I-Structure exhibited the following important features: (1) it is a dataflow-style synchronization, i.e., synchronization occurs only between I-Structure producer and consumer operations that access the same memory location; (2) it is fine-grain ...